Recognizing Nested Named Entities in GENIA corpus
نویسنده
چکیده
Nested Named Entities (nested NEs), one containing another, are commonly seen in biomedical text, e.g., accounting for 16.7% of all named entities in GENIA corpus. While many works have been done in recognizing non-nested NEs, nested NEs have been largely neglected. In this work, we treat the task as a binary classification problem and solve it using Support Vector Machines. For each token in nested NEs, we use two schemes to set its class label: labeling as the outmost entity or the inner entity. Our preliminary results show that while the outmost labeling tends to work better in recognizing the outmost entities, the inner labeling recognizes the inner NEs better. This result should be useful for recognition of nested NEs.
منابع مشابه
Nested Named Entity Recognition
Many named entities contain other named entities inside them. Despite this fact, the field of named entity recognition has almost entirely ignored nested named entity recognition, but due to technological, rather than ideological reasons. In this paper, we present a new technique for recognizing nested named entities, by using a discriminative constituency parser. To train the model, we transfo...
متن کاملپیکره اعلام: یک پیکره استاندارد واحدهای اسمی برای زبان فارسی
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...
متن کاملUsing name-internal and contextual features to classify biological terms
There has been considerable work done recently in recognizing named entities in biomedical text. In this paper, we investigate the named entity classification task, an integral part of the named entity extraction task. We focus on the different sources of information that can be utilized for classification, and note the extent to which they are effective in classification. To classify a name, w...
متن کاملA Distributional Semantics Approach to Simultaneous Recognition of Multiple Classes of Named Entities
Named Entity Recognition and Classification is being studied for last two decades. Since semantic features take huge amount of training time and are slow in inference, the existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al’s...
متن کاملTuning support vector machines for biomedical named entity recognition
We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make the SVM training with the available largest corpus – the GENIA corpus – tractable, we propose to split the non-entity class into sub-classes, using part-of-speech information. In addition, we explore new features such as word cache and the states of an HMM trained by unsupervised learning. Expe...
متن کامل